Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation
Abstract
When learning a dependence from data, to avoid overfitting, it is important to divide the data into the training set and the testing set. We first train our model on the training set, and then we use the data from the testing set to gauge the accuracy of the resulting model. Empirical studies show that the best results are obtained if we use 20-30% of the data for testing, and the remaining 70-80% of the data for training. In this paper, we provide a possible explanation for this empirical result.

1 Formulation of the Problem

Training a model: a general problem. In many practical situations, we have a model for a physical phenomenon, a model that includes several unknown parameters. These parameters need to be determined from the known observations; this determination is known as training the model.

Need to divide data into training set and testing set. In statistics in general, the more data points we use, the more accurate are the resulting estimates. From this viewpoint, it may seem that the best way to determine the parameters of the model is to use all the available data points in this determination. This is indeed a good idea if we are absolutely certain that our model adequately describes the corresponding phenomenon. In practice, however, we are often not absolutely sure that the current model is indeed adequate. In such situations, if we simply use all the available data to determine the parameters of the model, we often get overfitting – when the model describes all the data perfectly well without being actually adequate. For
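The train-then-test procedure described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the synthetic data (a noisy line y = 2x + 1), the linear model, the 80/20 split, and the helper names `train_test_split`, `fit_line`, and `mean_squared_error`. The parameters are estimated from the training set only, and the held-out testing set is then used to gauge accuracy.

```python
import random

def train_test_split(points, test_fraction=0.2, seed=0):
    """Shuffle the points and split them into (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Least-squares estimates of a and b in the model y = a * x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mean_squared_error(points, a, b):
    """Average squared difference between observed y and predicted a * x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in range(100)]

# Train on 80% of the data, then gauge accuracy on the held-out 20%.
train, test = train_test_split(data, test_fraction=0.2)
a, b = fit_line(train)
train_mse = mean_squared_error(train, a, b)
test_mse = mean_squared_error(test, a, b)

print("training points:", len(train), "testing points:", len(test))
print("estimated a, b:", a, b)
print("train MSE:", train_mse, "test MSE:", test_mse)
```

A test error much larger than the training error would suggest that the fitted model does not describe the phenomenon adequately, which is the situation the split is meant to detect.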